Abstract

A series of experiments to validate the Coding theorem method for evaluating
Kolmogorov complexity, based on approximations to Levin's Universal Distribution,
is presented, showing that the measure is stable in the face of
changes in computational formalism and that its results agree with
those obtained using lossless compression algorithms. We found that
strings which are more random according to the Coding theorem method also
turn out to be less compressible, while less random strings are clearly more
compressible. We also introduce a Block Matrix Decomposition technique
with an application to the classification of space-time evolutions of cellular
automata, providing further evidence of the soundness and readiness of the
Coding theorem method as an alternative and complement to compression
algorithms for approximating Kolmogorov complexity, especially for small
objects (e.g. short strings and small images) where lossless compression
algorithms fail.
1. Introduction



The usual way to approach the algorithmic complexity of a string has been
to use lossless compression algorithms. The length of the output of a lossless
compression algorithm is an upper bound on the algorithmic complexity of its
input. Short strings, however, are difficult to compress in practice, and the
theory does not provide a satisfactory solution to the problem of the
instability of the measure for short strings.

The chief advantage of lossless compression algorithms is that they are
a sufficient test of non-randomness. However, for short strings, which are
usually the ones useful in practical applications, adding the decompression
instructions to the compressed version often, if not always, makes the
compressed string longer than the string itself, simply because the
decompression instructions are at least as long as the original object.

Compression algorithms have also proven to be remarkably applicable in
several domains (see e.g. [22]), yielding surprising results as a method for
approximating Kolmogorov complexity. Hence their success is in part a matter
of their usefulness. Here we show that the Coding theorem method yields
similar results, and that it is actually compatible with the behavior of lossless
compression. For this experiment we devised an artful technique: we grouped
strings that our method indicated had the same program-size complexity in
order to construct files of concatenated strings of the same complexity (while
avoiding repetition, which could easily be exploited by compression). Then
a general lossless compression algorithm was used to compress the files, to
ascertain whether the files that were less compressed were the ones created
with highly complex strings according to our method. Similarly, files
with low Kolmogorov complexity were tested to determine whether they were
better compressed. This was indeed the case, and we report these results in
Section 7. In Subsection 7.2 we also show that the Coding theorem method
yields a classification of the space-time diagrams of Elementary Cellular
Automata very similar to the one obtained by compression, despite the
disadvantage of having used a limited sample of a Universal Distribution.
In all cases the statistical evidence is strong enough to suggest that the
Coding theorem method is sound and capable of producing satisfactory
results. The Coding theorem method also represents the only currently
available method for dealing with very short strings, and in a sense is an
expensive but powerful "microscope" for capturing the information content
of very small objects.

2. Kolmogorov-Chaitin complexity 

Central to algorithmic information theory (AIT) is the definition of
algorithmic (Kolmogorov-Chaitin or program-size) complexity [20] [6]:

$$K_T(s) = \min\{|p| : T(p) = s\} \qquad (1)$$

That is, the length of the shortest program p that outputs the string s 
running on a universal Turing machine T. The measure was first conceived to 
define randomness and is today the accepted objective and universal math- 
ematical measure of randomness, among other reasons because it has been 
proven to be mathematically robust (by virtue of the fact that several in- 
dependent definitions converge in it), although it is worth noting that there 
are important differences between infinite and finite randomness that we will 
not discuss here (among other reasons because we are only concerned with 
finite randomness in this paper). 

A classic example is a string composed of an alternation of bits, such as
$(01)^n$, which can be described as "n repetitions of 01". This repetitive string
can grow fast while its description will only grow by about $\log_2(n)$. On the
other hand, a random-looking string such as 011001011010110101 may not
have a much shorter description than itself.
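As a minimal sketch of this contrast (an illustration added here, not part of the method), the following Python compares the length of $(01)^n$ with the number of bits needed to write n in binary, which is the only part of the description "n repetitions of 01" that grows with n; the fixed remainder of the description is assumed to contribute only a constant.

```python
import math

def repeated_pattern(n: int) -> str:
    """Return (01)^n, the string '01' repeated n times."""
    return "01" * n

def bits_for_count(n: int) -> int:
    """Bits needed to write n in binary (the growing part of the description)."""
    return max(1, n.bit_length())

for n in (8, 1024, 2**20):
    s = repeated_pattern(n)
    print(f"n={n}: |s|={len(s)}, bits for n={bits_for_count(n)}, log2(n)={math.log2(n):.1f}")
```

The string length doubles with every doubling of n, while the description grows by a single bit.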

2.1. Uncomputability and instability of K 

A technical inconvenience of K as a function taking s to the length of the
shortest program that produces s is its uncomputability [6]. In other words,
there is no program which takes a string s as input and produces the integer
K(s) as output. This is usually considered a major problem, but one ought
to expect a universal measure of complexity to have such a property. On
the other hand, K is more precisely upper semi-computable, meaning that
one can find upper bounds, as we will do by applying a technique based on
another semi-computable measure to be presented in the next section.

The invariance theorem guarantees that complexity values will only
diverge by a constant c (e.g. the length of a compiler, a translation program
between $U_1$ and $U_2$), so that their relative difference vanishes in the limit.

Invariance Theorem ([U] [22]): If $U_1$ and $U_2$ are two universal Turing
machines and $K_{U_1}(s)$ and $K_{U_2}(s)$ the algorithmic complexity of s for
$U_1$ and $U_2$, there exists a constant c such that, for all s:

$$|K_{U_1}(s) - K_{U_2}(s)| < c \qquad (2)$$

Hence the longer the string, the less important c is (i.e. the choice of 
programming language or universal Turing machine). However, in practice c 
can be arbitrarily large, thus having a very great impact on short strings. 

Lossless compression algorithms have been used to approximate the Kol- 
mogorov complexity of an object (e.g. a string). However, if the string is 
shorter than, for example, the size of the decompression algorithm, there will 
not be a way to compress the string into something shorter still. The result 
will be so dependent on the size of the decompression algorithm that the final 
value of the compressed length will be too unstable under different lossless 
compression/decompression algorithms. 
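The effect is easy to reproduce. The sketch below uses Python's zlib module (one Deflate implementation, chosen here only for illustration) on a short random-looking string and on a long repetitive one: the fixed framing (a 2-byte header and a 4-byte checksum), together with the lack of long repetitions, makes the 10-byte input grow, while the 2000-byte repetitive input shrinks dramatically.

```python
import zlib

short = b"0110010110"   # 10 bytes without long repetitions
long_ = b"01" * 1000    # 2000 bytes of a highly regular pattern

for label, data in (("short", short), ("long", long_)):
    packed = zlib.compress(data, 9)   # level 9 = maximum compression
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes")
```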

3. Solomonoff-Levin Algorithmic Probability 

The algorithmic probability (also known as Levin's semi-measure) of a
string s is a measure that describes the expected probability of a random
program p running on a universal (prefix-free¹) Turing machine T producing
s. Formally [27] [HIE],

$$m(s) = \sum_{p\,:\,T(p) = s} 2^{-|p|} \qquad (3)$$

That is, the sum over all the programs p for which T, running p, outputs s
and halts.

Levin's semi-measure² m(s) defines a distribution known as the Universal
Distribution (a beautiful introduction is given in [IS]). It is important to
notice that the value of m(s) is dominated by the length of the smallest
program p (whose term $2^{-|p|}$ is the largest, since its denominator is the
smallest). However, the length of the smallest p that produces the string s
is K(s). The semi-measure m(s) is therefore also uncomputable, because for
every s, m(s) requires the calculation of $2^{-K(s)}$, involving K, which is
itself uncomputable. An alternative to the traditional use of compression
algorithms is the use of the concept of algorithmic probability to calculate
K(s) by means of the following theorem.

¹ The group of valid programs forms a prefix-free set (no element is a prefix
of any other), a property necessary to keep 0 < m(s) < 1. For details see

² It is called a semi-measure because the sum is never 1, unlike probability
measures. This is due to the Turing machines that never halt.

Coding Theorem (Levin [21]):

$$-\log_2 m(s) = K(s) + O(1) \qquad (4)$$

An informal interpretation is that if a string has many long descriptions,
it also has a short one. It beautifully connects frequency to complexity,
more specifically the frequency of occurrence of a string with its algorithmic
(Kolmogorov) complexity. The Coding theorem implies [TOl H] that one can
approximate the Kolmogorov complexity of a string from its frequency [T^
1221 [E], simply by rewriting the formula as:

$$K(s) = -\log_2 m(s) + O(1) \qquad (5)$$

That means that K and m are in perfect anti-correlation (K(s) and
$-\log_2 m(s)$ agree up to an additive constant) and, even if they may serve
different purposes, they are essentially the same complexity measure (unlike,
for example, Bennett's Logical Depth pQ, as we have shown in [26]). An
important property of m as a semi-measure is that it dominates any other
effective semi-measure μ, because there is a constant $c_\mu$ such that for all
s, $m(s) \geq c_\mu\,\mu(s)$. For this reason m(s) is often called a
Universal Distribution [19].

4. The Coding theorem method 

One can attempt to approximate m(s) by running every Turing machine
following a particular enumeration. A "natural" one is a quasi-lexicographical
order from shorter to longer machines, measured by the number n of states
and m of symbols (denoted by (n,m)) and therefore by the number of
transition rules. In this fashion, once a machine produces s for the
first time, one can directly calculate an exact value of K: because this
is the length of the first Turing machine in the enumeration of programs of
increasing size that produces s, there is no shorter machine producing s, and
based on Turing universality we know there is a machine T ∈ (n,m) that
produces s.




Let us now formalize a function D(n,m), as in our previous work, as an
approximation to m(s):

$$D(n,m)(s) = \frac{|\{T \in (n,m) : T(p) = s\}|}{|\{T \in (n,m) : T(p) \text{ halts}\}|} \qquad (6)$$

where T(p) is the Turing machine with number p (and empty input) that
produces s upon halting, and |A| is, in this case, the cardinality of the set
A. We have previously proved [32] [HI] that the function (n,m) → D(n,m)
is non-computable by reduction to the halting problem. However, D(n,m)
is lower semi-computable, meaning it can be computably approximated from
below, for example, by running small Turing machines. On a previous
occasion [32] [H] we calculated the full output distribution of Turing machines
with 2 symbols and n = 1, ..., 4 states, for which the Busy Beaver [21] values
are known, in order to determine the halting time. That is, a total of 36,
10 000, 7 529 536 and 11 019 960 576 Turing machines, respectively.
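These totals follow from the size of the transition table. Assuming, as in the 2D case described in Section 4.1, that halting instructions only write a symbol and do not move, each of the nm entries of a 1D machine holds one of 2nm + m instructions, giving $(2nm+m)^{nm}$ machines, i.e. $(4n+2)^{2n}$ for m = 2. A quick check in Python:

```python
def count_1d_machines(n: int, m: int = 2) -> int:
    """Number of 1D Turing machines with n states and m symbols.

    Assumption: each of the n*m table entries holds one of 2*n*m + m
    instructions, i.e. m halting instructions (write a symbol, no move) plus
    2*n*m non-halting ones (symbol x {left, right} x next state).
    """
    instructions_per_entry = 2 * n * m + m
    return instructions_per_entry ** (n * m)

print([count_1d_machines(n) for n in range(1, 5)])
```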

We have claimed before that, because there is a large enough number of
machines to run even for a small number of machine states (n), obtaining
a Universal Distribution and applying the Coding theorem provides a fine
and stable alternative to the evaluation of K(s), based on the frequency of
production of strings by Turing machines.
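The step from output frequencies to complexity values can be sketched in a few lines. This is a minimal illustration, assuming we already have one output per halting machine; it computes D(s) as in Eq. 6 and the estimate $-\log_2 D(s)$, dropping the additive O(1) constant of the Coding theorem. The toy outputs are hypothetical.

```python
import math
from collections import Counter

def coding_theorem_estimates(outputs):
    """Return {s: -log2 D(s)} from one output per halting machine.

    D(s) is the fraction of halting machines whose output is s; the additive
    O(1) constant of the Coding theorem is ignored.
    """
    counts = Counter(outputs)
    halting = sum(counts.values())
    return {s: -math.log2(c / halting) for s, c in counts.items()}

# Hypothetical toy data: the outputs of 8 halting machines.
toy_outputs = ["0", "0", "0", "0", "1", "1", "01", "0110"]
for s, k in sorted(coding_theorem_estimates(toy_outputs).items(), key=lambda kv: kv[1]):
    print(f"{s!r}: {k:.3f} bits")
```

Frequent outputs receive low values and rare ones high values; with real enumerations the counts span many orders of magnitude, which is what yields non-integer complexity values.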

In [2S], we have shown that the Coding theorem method with real-valued
numbers is compatible with strict program-size length given in integer values,
and does not correlate with another algorithmic measure of complexity
(Bennett's Logical Depth ^J), which is what is expected from two different
measures of complexity, given that Kolmogorov complexity is identified as
characterizing random complexity while Logical Depth is identified as
characterizing organized complexity. To the authors' knowledge, this result
associated with Logical Depth is one of the few results that say something
practical (an exact numerical evaluation) about the concept of Logical Depth,
together with the work reported in [30].

Here we report the results of an experiment with Turing machines that
are capable of writing on a 2-dimensional tape (by simply moving up and
down in addition to the traditional left and right movements of the head). A
reference to this kind of investigation and definition of 2D Turing machines
can be found in [28]. In Section 7.1, we will use this variation of the Turing
machine to provide evidence that Kolmogorov complexity evaluated through
this means (the Coding theorem method) is consistent with the other
powerful (and today only) method for approximating K, namely lossless
compression algorithms. We will do this in an artful way, given that
compression algorithms are unable to compress strings that are too short,
which are the strings covered by our method. This will involve concatenating
strings for which our method establishes a Kolmogorov complexity, which are
then given to a lossless compression algorithm in order to determine whether
it provides consistent estimations, that is, to determine whether strings are
less compressible where our method says that they have greater Kolmogorov
complexity and more compressible where our method says they have lower
Kolmogorov complexity. We provide evidence that this is actually the case.



Moreover, in Section 7.2 we will apply the results from the Coding theorem
method to approximate the Kolmogorov complexity of 2-dimensional
evolutions of 1-dimensional, nearest-neighbor Cellular Automata as defined
in [28], and contrast them with the approximation provided by a
general lossless compression algorithm (Deflate). As we will see, in all these
experiments we provide evidence that the method is just as successful as
compression algorithms, but unlike the latter, it can deal with short strings.

4.1. Deterministic 2-dimensional Turing machines

2-dimensional (2D) Turing machines are so characterized because they
run not on a 1-dimensional tape but on a 2-dimensional infinite array. At
each step they can move in four different directions (up, down, left, right)
or stop. Transitions have the format $\{n_1, m_1\} \to \{n_2, m_2, d\}$, meaning
that when the machine is in state $n_1$ and reads symbol $m_1$, it writes $m_2$,
changes to state $n_2$ and moves to a contiguous cell following direction d.
If $n_2$ is the halting state then d is stop. In other cases, d can be any of
the four movement directions.

Let $(n,m)_{2D}$ be the set of Turing machines with n states and m symbols.
These machines have nm entries in the transition table, and for each entry
$\{n_1, m_1\}$ there are 4nm + m possible instructions, that is, m different
halting instructions (writing one of the different symbols) and 4nm
non-halting instructions (4 directions, n states and m different symbols). So
the number of machines in $(n,m)_{2D}$ is $(4nm + m)^{nm}$. It is possible to
enumerate all these machines in the same way as 1D Turing machines (e.g. as
has been done in [28] and [H]). We can assign one number to each entry in
the transition table. These numbers go from 0 to 4nm + m - 1 (given that
there are 4nm + m different instructions). The numbers corresponding to all
entries in the transition table (irrespective of the convention followed in
sorting them) form a number with nm digits in base 4nm + m. Then, the
translation of a transition table to a natural number and vice versa can be
done through elementary arithmetical operations.
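Because the text leaves the sorting convention open, the following Python sketch fixes one hypothetical convention (halting instructions first, then instructions ordered by symbol, direction and state) purely to illustrate the base-(4nm + m) translation and its inverse; it is not necessarily the numbering the authors used.

```python
# A sketch of one possible numbering of (n,m)_2D machines. The instruction
# ordering below is our own assumption; any fixed bijection works.
DIRECTIONS = ("up", "down", "left", "right")

def decode_instruction(i: int, n: int, m: int) -> tuple:
    """Map a digit 0 <= i < 4nm+m to (new_state, write, direction); state 0 halts."""
    if i < m:                                  # the m halting instructions
        return (0, i, "stop")
    j = i - m                                  # the 4nm non-halting instructions
    write, rest = j % m, j // m
    return (rest // 4 + 1, write, DIRECTIONS[rest % 4])

def encode_instruction(instr: tuple, n: int, m: int) -> int:
    """Inverse of decode_instruction."""
    state, write, direction = instr
    if state == 0:
        return write
    return m + write + m * (DIRECTIONS.index(direction) + 4 * (state - 1))

def machine_to_number(table: list, n: int, m: int) -> int:
    """Read the nm entries (in a fixed entry order) as base-(4nm+m) digits."""
    base, number = 4 * n * m + m, 0
    for instr in table:
        number = number * base + encode_instruction(instr, n, m)
    return number

def number_to_machine(number: int, n: int, m: int) -> list:
    """Recover the nm instructions from a machine number."""
    base, digits = 4 * n * m + m, []
    for _ in range(n * m):
        number, digit = divmod(number, base)
        digits.append(digit)
    return [decode_instruction(d, n, m) for d in reversed(digits)]

table = number_to_machine(123456789012, 4, 2)
print(table[:2], machine_to_number(table, 4, 2))
```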

We take as output for a 2D Turing machine the minimal array that
includes all cells visited by the machine. Note that this may include cells
that have not been visited, but it is the most natural way of producing output
with some regular format and at the same time reducing the set of different
outputs.



{1,1} → {0,0,stop}
{1,0} → {3,1,right}
{2,1} → {3,1,up}
{2,0} → {0,1,stop}
{3,1} → {0,0,down}
{3,0} → {2,0,left}

Figure 1: Example of a deterministic 2-dimensional Turing machine (transition table and execution from step 0 to step 5).

Fig. 1 shows an example of the transition table of a Turing machine in
$(3,2)_{2D}$ and its execution over a '0'-filled grid. We show the portion of the
grid that is returned as the output array. Two of the six cells have not been
visited by the machine.
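The output convention (the minimal rectangle enclosing all visited cells) can be illustrated with a small simulator. This is our sketch, not the authors' implementation; it assumes the machine starts in state 1 at the origin, that state 0 is the halting state, and that "up" increases the row coordinate. Run on the five rules of Fig. 1 that are reachable from a '0'-filled grid, it halts after 5 steps with a 2 × 3 output array, two of whose six cells were never visited.

```python
from typing import Dict, Tuple

# Fig. 1 machine: (state, read symbol) -> (new state, symbol to write, move).
# State 0 is the halting state. The {3,1} rule is never reached from a
# '0'-filled grid, so this sketch leaves it out.
FIG1 = {
    (1, 1): (0, 0, "stop"),
    (1, 0): (3, 1, "right"),
    (2, 1): (3, 1, "up"),
    (2, 0): (0, 1, "stop"),
    (3, 0): (2, 0, "left"),
}
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def run_2d(table, blank=0, max_steps=2000):
    """Run from state 1 at the origin of a blank grid.

    Returns (output, steps): output is the minimal rectangle containing every
    visited cell, listed top row first. Returns None if the machine has not
    halted after max_steps steps.
    """
    grid: Dict[Tuple[int, int], int] = {}
    x = y = 0
    state, visited = 1, {(0, 0)}
    for step in range(1, max_steps + 1):
        new_state, write, move = table[(state, grid.get((x, y), blank))]
        grid[(x, y)] = write
        if new_state == 0:  # halting transitions write but do not move
            xs = [cx for cx, _ in visited]
            ys = [cy for _, cy in visited]
            output = [[grid.get((cx, cy), blank) for cx in range(min(xs), max(xs) + 1)]
                      for cy in range(max(ys), min(ys) - 1, -1)]
            return output, step
        dx, dy = MOVES[move]
        x, y, state = x + dx, y + dy, new_state
        visited.add((x, y))
    return None

print(run_2d(FIG1))  # ([[1, 0, 0], [0, 1, 0]], 5)
```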



5. An approximation to the Universal Distribution

We have run all machines in $(4,2)_{2D}$ just as we have done before for
deterministic 1-dimensional Turing machines [HI] [25], that is, considering
the output of all different machines starting both on a '0'-filled grid and on
a '1'-filled grid.

We also used a reduced enumeration to avoid running certain trivial machines
whose behavior can be predicted from the transition table, as well as
filters to detect non-halting machines before exhausting the entire runtime.
In the reduced enumeration we considered only machines with an initial
transition moving to the right and changing to a state different from the
initial and halting states. Machines moving to the initial state at the starting
transition run forever, and machines moving to the halting state produce
single-character output. So we reduce the number of initial transitions in
$(n,m)_{2D}$ to m(n - 1) (the machine can write any of the m symbols and
change to any state in {2, ..., n}). The set of different machines is reduced
accordingly to $m(n-1)(4nm + m)^{nm-1}$. To enumerate these machines we
construct a mixed-radix number, given that the digit corresponding to the
initial transition now goes from 0 to m(n - 1) - 1. To the output obtained
when running this reduced enumeration we add the single-character arrays
that correspond to machines moving to the halting state at the starting
transition. These machines and their output can be easily quantified. Also,
to take into account machines with the initial transition moving in a
direction other than right, we consider the 90, 180 and 270 degree rotations
of the arrays produced, given that for any machine moving up (left/down)
at the initial transition, there is another one moving right that produces the
same output rotated by -90 (-180/-270) degrees.
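The counts and the rotation completion can be checked with a few lines of Python. This is a sketch under our own representation (arrays as lists of rows); the rotation helper only illustrates the completion step and is not the authors' code. For $(4,2)_{2D}$ it gives $34^8$ machines in the full enumeration and $6 \times 34^7$ in the reduced one, and twice the full count equals the total of non-halting plus halting machines reported in Section 6.

```python
def full_count(n: int, m: int) -> int:
    """Machines in (n,m)_2D: nm entries with 4nm+m possible instructions each."""
    return (4 * n * m + m) ** (n * m)

def reduced_count(n: int, m: int) -> int:
    """Reduced enumeration: the initial entry has only m(n-1) choices."""
    return m * (n - 1) * (4 * n * m + m) ** (n * m - 1)

def rotations(array):
    """The array (a list of rows) and its 90, 180 and 270 degree clockwise rotations."""
    result = [array]
    for _ in range(3):
        previous = result[-1]
        result.append([list(row) for row in zip(*previous[::-1])])
    return result

print(full_count(4, 2), reduced_count(4, 2))
for r in rotations([[1, 0, 0], [0, 1, 0]]):
    print(r)
```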

5.1. Setting the runtime 

The Busy Beaver runtime value for (4,2) is 107 steps before halting.
But no Busy Beaver values are known for 2-dimensional Turing machines. So
to set the runtime in our experiment we generated a sample of 334 × 10^8
random machines in the reduced enumeration, that is, 10.6% of the machines
in the reduced enumeration for $(4,2)_{2D}$. We used a runtime of 2000 steps
for the runtime sample, but 1500 steps for running all of $(4,2)_{2D}$. These
machines were generated instruction by instruction. As we have explained
above, it is possible to assign a natural number to every instruction. So
to generate a random machine in the reduced enumeration for $(n,m)_{2D}$ we
produce a random number from 0 to m(n - 1) - 1 for the initial transition
and from 0 to 4nm + m - 1 for the other nm - 1 transitions. We used the
implementation of the Mersenne Twister in the Boost C++ library. The
output of this sample was the distribution of the runtime of the halting
machines.
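The sampling step can be sketched as follows. The authors used Boost's Mersenne Twister in C++; this Python version substitutes the standard random module, which is also an MT19937 generator, and the seed is arbitrary. It only draws the digits of each machine; decoding them into instructions would use whatever numbering convention has been fixed.

```python
import random

def random_reduced_machine(n: int, m: int, rng: random.Random) -> list:
    """Draw the digits of one machine of the reduced enumeration of (n,m)_2D.

    The first digit (initial transition) is uniform on 0..m(n-1)-1 and the
    other nm-1 digits are uniform on 0..4nm+m-1.
    """
    digits = [rng.randrange(m * (n - 1))]
    digits += [rng.randrange(4 * n * m + m) for _ in range(n * m - 1)]
    return digits

rng = random.Random(42)   # Python's random module uses the Mersenne Twister (MT19937)
sample = [random_reduced_machine(4, 2, rng) for _ in range(1000)]
print(sample[0])
```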

Fig. 2 shows the probability that a random halting machine will halt in
at most the number of steps indicated on the horizontal axis. For 100 steps
this probability is 0.9999995273. Note that the machines in the sample are
in the reduced enumeration, from which a large number of very trivial
machines halting in just one step have been removed. So in the complete
enumeration the probability of halting in at most 100 steps is even greater.

But we found some high runtime values: precisely 23 machines required
more than 1000 steps. The highest value was a machine progressing through
1483 steps before halting. So we have enough evidence to believe that by
setting the runtime at 2000 steps we have obtained almost all (if not all)
output arrays. We ran all 6 × 34^7 Turing machines in the reduced
enumeration for $(4,2)_{2D}$. Then we applied the completions explained before.

Figure 2: Accumulated runtime distribution for $(4,2)_{2D}$.

6. Output Analysis 

The final output represents the result of $2(4nm + m)^{nm}$ executions (all
machines in $(4,2)_{2D}$ starting with both blank symbols '0' and '1'). We found
3 079 179 980 224 non-halting machines and 492 407 829 568 halting machines.
A total of 1 068 618 different binary arrays were produced after 12 days
of calculation with a medium-sized supercomputer (25 x86-64 CPUs running
at 2128 MHz, each with 4 GB of memory) located at the Centro Informático
Científico de Andalucía (CICA), Spain.

Let $D(4,2)_{2D}$ be the set constructed by dividing the occurrences of each
different array by the number of halting machines, as a natural extension of
Eq. 6 to 2-dimensional Turing machines. Then, for every string s,

$$K_{m,2D}(s) = -\log_2\big(D(4,2)_{2D}(s)\big) \qquad (7)$$

using the Coding theorem (Eq. 3). Fig. 3 shows the top 36 objects in
$D(4,2)_{2D}$, that is, the objects with the lowest Kolmogorov complexity values.

6.1. Evaluating 2-dimensional Kolmogorov complexity 

Figure 3: The top 36 objects in $D(4,2)_{2D}$ preceded by their $K_{m,2D}$ values, sorted from higher to lower frequency (and therefore from smaller to larger Kolmogorov complexity after application of the Coding theorem). Only non-symmetrical cases are displayed.

Figure 4: Frequency of appearance of "checkerboard" patterns sorted from more to less frequent according to $D(4,2)_{2D}$ (only cases that are non-symmetrical under rotation and complementation are displayed). The 4 × 4 checkerboard does not occur. However, all 3 × 3 arrays (see Fig. 8), including the 3 × 3 "checkerboard" pattern, do occur.

$D(4,2)_{2D}$ denotes the frequency distribution (a calculated Universal
Distribution) from the output of deterministic 2-dimensional Turing machines,
with associated complexity measure $K_{m,2D}$. $D(4,2)_{2D}$ distributes 1 068 618
arrays into 1272 different complexity values, with a minimum complexity
value of 2.22882 bits (an explanation of non-integer program-size complexity
is given in [25] and [26]), a maximum value of 36.2561 bits and a mean
of 35.1201. Considering the number of possible square binary arrays given
by the formula $2^{d^2}$ (without considering any symmetries), $D(4,2)_{2D}$ can
be said to produce all square binary arrays of size up to 3 × 3, that is,
$\sum_{d=1}^{3} 2^{d^2} = 530$ square arrays, and 60016 of the $2^{4^2} = 65536$
square arrays with side of length (or dimension) d = 4. It only produces
84104 of the 33 554 432 possible square binary arrays of dimension d = 5 and
only 11328 of the possible 68 719 476 736 of dimension d = 6. The largest
square array produced in $D(4,2)_{2D}$ is of side length d = 13 (see left of
Fig. 5), out of a possible $2^{169} \approx 7.48 \times 10^{50}$; it has a
$K_{m,2D}$ value equal to 34.2561.
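The proportions behind these figures are simple arithmetic on the counts quoted above, using $2^{d^2}$ possible square arrays of side d. A short Python check:

```python
# Square-array counts reported for D(4,2)_2D, compared with the 2^(d^2) possible arrays.
square_found = {4: 60016, 5: 84104, 6: 11328}

for d, n_found in square_found.items():
    possible = 2 ** (d * d)
    print(f"d={d}: {n_found} of {possible} ({100 * n_found / possible:.6f}%)")

all_up_to_3 = sum(2 ** (d * d) for d in range(1, 4))   # 2 + 16 + 512
print(all_up_to_3, f"{2 ** 169:.2e}")                   # arrays up to 3x3; 13x13 arrays
```

Coverage collapses quickly with d: over 90% of the 4 × 4 arrays are produced, but only about 0.25% of the 5 × 5 arrays.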




Figure 5: Left: The only (and most complex) square array of side 13 (with 15 other symmetrical cases) in $D(4,2)_{2D}$, with $K_{m,2D}$ = 34.2561. Another way to see this array is as one of low complexity among those of side 13, given that it occurred once in the sampled distribution, unlike all other square arrays of the same size, which are missing from $D(4,2)_{2D}$. Right: With a value of $K_{m,2D}$ = 6.7, this is the simplest 4 × 4 square array after the all-blank 4 × 4 array (with $K_{m,2D}$ = 6.4) and before the 4 × 4 square array with a black cell in one of its corners (with $K_{m,2D}$ = 6.9).

What one would expect from a distribution where simple patterns are
more frequent (and therefore have lower Kolmogorov complexity after
application of the Coding theorem) would be to see patterns of the
"checkerboard" type with high frequency and low random complexity (K), and
this is exactly what we found (see Fig. 4), while random-looking patterns were
found at the bottom among the least frequent ones (Fig. 6).

Figure 6: Bottom 16 objects in the classification, with the lowest frequency, that is, the most random according to $D(4,2)_{2D}$. It is interesting to note the strong similarities, given that similar-looking cases are not always exact symmetries. The arrays are preceded by the number of occurrences of production from all the $(4,2)_{2D}$ Turing machines.

We have coined the informal notion of a "climber" as an object in the
frequency classification (from greatest to lowest frequency) that appears
better classified among objects of smaller size than among the arrays of its
own size, in order to highlight possible candidates for low complexity, hence
illustrating how the process makes low-complexity patterns emerge. For
example, "checkerboard" patterns (see Fig. 4) seem to be natural "climbers"
because they come significantly earlier (more frequent) in the classification
than most of the square arrays of the same size. In fact, the larger the
checkerboard array, the more of a climber it seems to be. This is in agreement
with what we have found in the case of strings [32] [TH] [25], where patterned
objects emerge (e.g. $(01)^n$, that is, the string 01 repeated n times),
appearing increasingly higher in the frequency classifications the
larger n is, in agreement with the expectation that patterned objects should
also have low Kolmogorov complexity.
An attempt at a definition of a climber is a pattern P of size a × b with
small complexity among all a × b patterns, such that there exist smaller
patterns Q (say c × d, with cd < ab) such that
$K_m(P) < K_m(Q) < \mathrm{median}\{K_m(R) : R \text{ an } a \times b \text{ pattern}\}$.
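The formal part of this condition can be written as a predicate. The sketch below checks only the inequality (the informal "small complexity among all a × b patterns" part is left to judgment); the data layout and the toy values are hypothetical.

```python
from statistics import median

def is_climber(k_p, size_p, k_by_size):
    """Check the formal part of the climber condition for a pattern P.

    k_p: K_m(P); size_p: (a, b); k_by_size: {(rows, cols): [K_m values of all
    patterns of that size]}. True if some smaller pattern Q (cd < ab) satisfies
    K_m(P) < K_m(Q) < median of the K_m values of all a x b patterns.
    """
    a, b = size_p
    threshold = median(k_by_size[size_p])
    return any(k_p < k_q < threshold
               for (c, d), values in k_by_size.items() if c * d < a * b
               for k_q in values)

# Hypothetical complexity values, for illustration only.
toy = {(1, 2): [2.0, 3.0], (2, 2): [2.5, 6.0, 7.0, 8.0]}
print(is_climber(2.5, (2, 2), toy), is_climber(7.0, (2, 2), toy))
```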

For example, Fig. 7 shows arrays that come together among groups of
much shorter arrays, thereby demonstrating, as expected from a measure
of randomness, that array (or string) size is not what determines complexity
(as we have shown before in [32] [25] for binary strings). The fact
that square arrays may have low Kolmogorov complexity can be understood
in several ways, some of which strengthen the intuition that square arrays
should be less Kolmogorov random, such as, for example, the fact that for
square arrays one only needs the information of one of its dimensions to
determine the other, either height or width.

Fig. 7 shows cases in which square arrays are significantly better classified
towards the top than arrays of similar size. Indeed, 100% of the squares of
size 2 × 2 are in the first fifth (F1), as are the 3 × 3 arrays. Square arrays
of size 4 × 4 are distributed as follows when dividing the classification of
$D(4,2)_{2D}$ into 5 equal parts: 72.66%, 15.07%, 6.17359%, 2.52%, 3.56%.
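One reading of this computation is sketched below: cut the frequency ranking into five consecutive parts of (nearly) equal length and report what percentage of the objects of interest (for example, the 4 × 4 squares) falls into each part. The function and the toy ranking are hypothetical illustrations, not the authors' code.

```python
def fifth_shares(ranking, is_target):
    """Percentage of target objects in each fifth of a frequency ranking.

    ranking: objects sorted from most to least frequent. The ranking is cut
    into 5 consecutive parts of (nearly) equal length.
    """
    n = len(ranking)
    bounds = [round(i * n / 5) for i in range(6)]
    counts = [sum(1 for obj in ranking[bounds[i]:bounds[i + 1]] if is_target(obj))
              for i in range(5)]
    total = sum(counts)
    return [100 * c / total for c in counts] if total else [0.0] * 5

# Hypothetical toy ranking: "sq" marks the square arrays of interest.
toy_ranking = ["sq", "sq", "x", "sq", "x", "x", "x", "x", "x", "sq"]
print(fifth_shares(toy_ranking, lambda obj: obj == "sq"))
```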



7. Validation of the Coding theorem method by Compressibility 

One way to validate our method based on the Coding theorem (Eq. 3) is
to attempt to measure its departure from the compressibility approach. This
cannot be done directly, for, as we have explained, compression algorithms
perform poorly on short strings, but we did find a way to partially circumvent
this problem: we selected subsets of strings for which our Coding theorem
method calculated a high or low complexity, and used them to generate
files long enough to be compressed.

7.1. Comparison of $K_m$ and approaches based on compression

It is also not uncommon to detect instabilities in the values retrieved by
a compression algorithm for short strings, which, as explained in Section 2.1,
the compression algorithm may or may not compress. This is not
a malfunction of a particular lossless compression algorithm (e.g. Deflate,
used in most popular computer formats such as ZIP and PNG) or its
implementation, but a commonly encountered problem when lossless
compression algorithms attempt to compress short strings.

When researchers have chosen to use compression algorithms for reasonably
long strings, they have proven to be of great value, for example, for DNA
false positive repeat sequence detection in genetic sequence analysis [23], in
distance measures and classification methods [5], and in numerous other
applications [22]. However, this effort has been hamstrung by the limitations
of compression algorithms, currently the only method used to approximate
the Kolmogorov complexity of a string, given that this measure is not
computable.

Figure 8: Complete reduced set (non-symmetrical cases under reversion and complementation) of square arrays of size 3 × 3 in $K_{m,2D}$, sorted from lowest to greatest Kolmogorov complexity after application of the Coding theorem (Eq. 3) to the output frequency of 2D Turing machines. We denote this set by $K_{m,2D,3\times3}$. For example, the 2 glider configurations in the Game of Life come with high Kolmogorov complexity (with approximated values of 20.2261 and 20.5031).

In this section we study the relation between $K_m$ and approaches to
Kolmogorov complexity based on compression. We show that both approaches
are consistent, that is, strings with higher $K_m$ value are less compressible
than strings with lower values. This is as much a validation of $K_m$ and our
Coding theorem method as it is of the traditional lossless compression method
as approximation techniques to Kolmogorov complexity. The Coding theorem
method is, however, especially useful for short strings where lossless
compression algorithms fail, and the compression method is especially useful
where the Coding theorem is too expensive to apply (long strings).

7.1.1. Compressing strings of length 10 to 15 

For this experiment we have selected the strings in D(5) with lengths
ranging from 10 to 15. D(5) is the frequency distribution of strings produced
by all 1-dimensional deterministic Turing machines as described in
[25]. Table 1 shows the number of D(5) strings with these lengths. Up to
length 13 we have almost all possible strings. For length 14 we have a
considerable number, and for length 15 there are less than 50% of the $2^{15}$
possible strings. The distribution of complexities is shown in Figure 9.



Length (l)    Strings
10            1024
11            2048
12            4094
13            8056
14            13068
15            14634

Table 1: Number of strings of length 10 to 15 found in D(5)
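The coverage statements above follow directly from Table 1 by dividing each count by the $2^l$ possible binary strings of length l:

```python
# Table 1 counts for D(5), compared with the 2^l possible binary strings.
table1 = {10: 1024, 11: 2048, 12: 4094, 13: 8056, 14: 13068, 15: 14634}

for length, count in table1.items():
    print(f"l={length}: {count}/{2 ** length} = {100 * count / 2 ** length:.2f}%")
```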

Figure 9: Distribution of complexities for different string lengths (l).

As expected, the longer the strings, the greater their average complexity.
The overlapping of strings with different lengths that have the same
complexity corresponds to climbers. The experiment consisted of creating files
with strings of different $K_m$-complexity but equal length (files with more
complex (random) strings are expected to be less compressible than files with
less complex (random) strings). This was done in the following way. For each
l (10 ≤ l ≤ 15), we let S(l) denote the list of strings of length l, sorted by
increasing $K_m$ complexity. For each S(l) we made a partition into 10 sets with
the same number of consecutive strings. Let us call these partitions P(l,p),
1 ≤ p ≤ 10.

Then for each P(l,p) we created 100 files, each with 100 random
strings in P(l,p) in random order. We called these files F(l,p,f),
1 ≤ f ≤ 100. Summarizing, we now have:

• 6 different string lengths l, from 10 to 15, and for each length

• 10 partitions (sorted by increasing complexity) of the strings with
length l, and

• 100 files with 100 random strings in each partition.

This makes for a total of 6 000 different files. Each file contains 100
different binary strings, and hence has a length of 100 × l symbols.
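The construction of the files F(l,p,f) can be sketched as below. This is an illustration under stated assumptions: the partitions are taken as equal-sized consecutive slices of S(l), any remainder at the end is dropped, and the strings in a file are drawn without replacement. The stand-in list used in the example is not sorted by $K_m$.

```python
import random

def make_files(sorted_strings, n_parts=10, n_files=100, per_file=100, seed=0):
    """Sketch of the F(l, p, f) construction for one string length l.

    sorted_strings is S(l), sorted by increasing K_m. It is cut into n_parts
    consecutive partitions of equal size (any remainder at the end is dropped
    in this sketch); each file holds per_file distinct strings drawn at random
    from one partition, in random order. Returns files[p][f] (0-based).
    """
    rng = random.Random(seed)
    size = len(sorted_strings) // n_parts
    parts = [sorted_strings[i * size:(i + 1) * size] for i in range(n_parts)]
    # random.sample draws without replacement and returns the picks in random order
    return [[rng.sample(part, per_file) for _ in range(n_files)] for part in parts]

# Stand-in for S(10): all 1024 strings of length 10 (NOT sorted by K_m here).
s10 = [format(i, "010b") for i in range(1024)]
files = make_files(s10)
print(len(files), len(files[0]), len(files[0][0]))
```

Repeating this for l = 10, ..., 15 gives the 6 × 10 × 100 = 6 000 files.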

A crucial step is to replace the binary encoding of the files by a larger
alphabet, retaining the internal structure of each string. If we compressed
the files F(l,p,f) using binary encoding, then the final size of the resulting
compressed files would depend not only on the complexity of the separate
strings but also on the patterns that the compressor discovers across the whole
file.¹ To circumvent this, we chose two different symbols to represent '0'
and '1' in each one of the 100 different strings in each file. The same set of
200 symbols was used for all files. We were interested in using the most
standard symbols we possibly could, so we created all pairs of characters from
'a' to 'p' (256 different pairs), and from this set we selected 200
two-character symbols that were the same for all files. This way, though we
do not completely avoid the possibility of the compressor finding patterns in
whole files due to the repetition of the same single character in different
strings, we considerably reduce the impact of this phenomenon.
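A sketch of the symbol substitution described above. The text does not say which 200 of the 256 pairs were chosen, so this assumes the first 200 in lexicographic order; separating strings by newlines is also an assumption, and `PAIRS` and `encode_file` are hypothetical names.

```python
from itertools import product

# All 256 two-character symbols built from the letters 'a'..'p' (16 x 16).
PAIRS = ["".join(p) for p in product("abcdefghijklmnop", repeat=2)]

def encode_file(strings, symbols=PAIRS[:200]):
    """Give the i-th string its own pair of symbols (one for '0', one for '1'),
    so that identical bit patterns in different strings look different."""
    if 2 * len(strings) > len(symbols):
        raise ValueError("not enough symbols for this many strings")
    encoded = []
    for i, s in enumerate(strings):
        zero, one = symbols[2 * i], symbols[2 * i + 1]
        encoded.append("".join(zero if bit == "0" else one for bit in s))
    return "\n".join(encoded)
```

With 100 strings per file, exactly 200 symbols are consumed, matching the set size given in the text.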



Figure 10: Distribution of the compressed lengths of the files (panels by
string length, e.g. l = 10 and l = 11; horizontal axis: files sorted by
increasing Km, groups 1 to 10).



¹ We experimented with binary files, and the results were quite different
from what we present here.
The files were compressed using the Mathematica function Compress,
which is an implementation of the Deflate algorithm (Lempel-Ziv plus
Huffman coding). Fig. 10 shows the distributions of lengths of the compressed
files for the different string lengths. The horizontal axis shows the 10 groups
of files in increasing Km. As the complexity of the strings grows (right part
of the diagrams), the compressed files are larger, so they are harder to
compress. The relevant exception is length 15, but this is probably related to
the low number of strings of that length that we have found, which are very
likely not the most complex strings of length 15.

We have used other compressors such as GZIP (which uses the Lempel-Ziv
algorithm LZ77) and BZIP2 (Burrows-Wheeler block-sorting text compression
and Huffman coding), with several compression levels. The results are similar
to those shown in Fig. 10.
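To reproduce the spirit of this comparison without Mathematica, one could use Python's standard `zlib` (Deflate), `gzip` and `bz2` modules. This is only a sketch: Mathematica's Compress adds its own framing, so absolute sizes will differ from those in Fig. 10, and the helper names are hypothetical.

```python
import bz2
import gzip
import zlib
from statistics import mean

def compressed_lengths(text):
    """Compressed size in bytes of one encoded file under three compressors."""
    data = text.encode("ascii")
    return {
        "deflate": len(zlib.compress(data, 9)),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }

def mean_length_per_group(groups_of_files, method="deflate"):
    """Average compressed length for each complexity group; if Km tracks
    compressibility, these averages should grow from group 1 to group 10."""
    return [mean(compressed_lengths(f)[method] for f in files)
            for files in groups_of_files]
```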
7.1.2. Comparing (4,2)2D and (4,2)



Figure 11: Scatterplot of Km with 2-dimensional Turing machines as a function
of Km with 1-dimensional Turing machines (horizontal axis: one-dimensional
complexity).



We shall now look at how 1-dimensional arrays (hence strings) produced
by 2D Turing machines correlate with strings that we have calculated before
[25] (denoted by D(5)). In a sense this is like changing the Turing machine
formalism to see whether the new distribution resembles distributions
following other Turing machine formalisms, and whether it is robust enough.

All Turing machines in (4,2) are included in (4,2)2D because these are
just the machines that do not move up or down. We first compared the
values of the 1832 output strings in (4,2) to the 1-dimensional arrays found
in (4,2)2D. We are also interested in the relation between the ranks of these
1832 strings in both (4,2) and (4,2)2D.




Figure 12: Scatterplot of Km with 2-dimensional Turing machines as a function
of Km with 1-dimensional Turing machines by length of strings, for strings of
length 5 to 13 (horizontal axes: one-dimensional complexity).
Fig. 11 shows the link between Km,2D with 2D Turing machines as a
function of ordinary Km,1D (that is, simply Km as defined in [25]). It
suggests a strong, almost linear overall association. The correlation
coefficient r = 0.9982 confirms the linear association, and the Spearman
correlation coefficient r_s = 0.9998 indicates a tight and increasing
functional relation.
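Both coefficients can be computed directly. The following dependency-free sketch assumes two equal-length lists holding Km,1D and Km,2D for the same strings (the paired data are not reproduced here); Spearman's coefficient is computed as the Pearson coefficient of the ranks, with tied values given their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```

A monotone but non-linear relation gives a Spearman coefficient of 1 while Pearson's r stays below 1, which is why the two coefficients are reported separately.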



The length l of strings is a possible confounding factor. However, Fig. 12
suggests that the link between one- and 2-dimensional complexities is not
explainable by l. Indeed, the partial correlation
$r_{K_{m,1D}K_{m,2D}\cdot l} = 0.9936$ still denotes a tight association.
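The partial correlation controlling for length can be obtained from the three pairwise Pearson coefficients with the standard first-order formula. In this sketch x stands for Km,1D, y for Km,2D and z for l; the pairwise values are inputs, since they are not listed in the text.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z,
    computed from the three pairwise Pearson correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When z is uncorrelated with both variables the partial correlation equals r_xy; the more of the x-y association that runs through z, the further it drops.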

Fig. 12 also suggests that complexities are more strongly linked for
longer strings. This is in fact what Table 2 shows: the strength of the
link generally increases with the length of the resulting strings. One- and
2-dimensional complexities are remarkably correlated and may be considered
two measures of the same underlying feature of the strings. How these measures
vary is another matter. The regression of Km,2D on Km,1D gives the following
approximate relation: Km,2D ≈ 2.64 + 1.11 Km,1D. Note that this subtle
departure from identity may be a consequence of a slight non-linearity, a
feature visible in Fig. 11.
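The regression line above is an ordinary least-squares fit. A minimal sketch, assuming paired lists of Km,1D (x) and Km,2D (y) values; `linear_fit` is a hypothetical helper, and the check below only confirms that it recovers a known line, not the paper's data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y ≈ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b
```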



Table 2: Correlation coefficients between one- and 2-dimensional complexities
by length of strings.

Length (l)    Correlation
5             0.9724
6             0.9863
7             0.9845
8             0.9944
9             0.9977
10            0.9952
11            1
12            1



7.2. Comparison of Km and compression of Cellular Automata 

A 1-dimensional CA can be represented by an array of cells $x_i$, where
$i \in \mathbb{Z}$ (the set of integers) and each $x$ takes a value from a
finite alphabet $\Sigma$. Thus, a sequence of cells $\{x_i\}$ of finite length
$n$ describes a string or global configuration $c$ on $\Sigma$. This way, the
set of finite configurations will be expressed as $\Sigma^n$. An evolution
comprises a sequence of configurations $\{c^t\}$ produced by the mapping
$\Phi: \Sigma^n \rightarrow \Sigma^n$; thus the global relation is symbolized as:



$$\Phi(c^{t}) \rightarrow c^{t+1} \qquad (8)$$

where $t$ represents time and every global state of $c$ is defined by a
sequence of cell states. The global relation is determined over the cell states
in configuration $c^t$, updated simultaneously at the next configuration
$c^{t+1}$ by a local function $\varphi$ as follows:

$$\varphi(x^{t}_{i-r}, \ldots, x^{t}_{i}, \ldots, x^{t}_{i+r}) \rightarrow x^{t+1}_{i} \qquad (9)$$

Wolfram [28] represents 1-dimensional cellular automata (CA) with two
parameters $(k, r)$, where $k = |\Sigma|$ is the number of states and $r$ is the
neighborhood radius. The type of CA considered here is defined by the
parameters (2, 1). There are $k^n$ different neighborhoods (where
$n = 2r + 1$) and $k^{k^n}$ distinct evolution rules. The evolutions of these
cellular automata usually have periodic boundary conditions. Wolfram calls
this type of CA Elementary Cellular Automata (denoted simply by ECA), and
there are exactly $k^{k^n} = 256$ rules of this type. They are considered the
simplest cellular automata (and among the simplest computing programs)
capable of great behavioral richness.
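An ECA can be simulated in a few lines. With k = 2 and r = 1 there are 2^3 = 8 neighborhoods, so an 8-bit rule number (Wolfram's numbering) specifies one of the 256 local functions. This sketch uses periodic boundaries and a single black cell as the initial condition; the function names are illustrative.

```python
def eca_step(cells, rule):
    """One synchronous update of an ECA (k = 2, r = 1) with periodic
    boundaries. The new state of a cell is bit v of the rule number, where
    v is the neighborhood (left, center, right) read as a 3-bit integer."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def eca_evolution(rule, width, steps):
    """Space-time diagram: row t is the configuration c^t, starting from
    a single black cell in the middle of the row."""
    row = [0] * width
    row[width // 2] = 1
    diagram = [row]
    for _ in range(steps):
        row = eca_step(row, rule)
        diagram.append(row)
    return diagram
```

For example, `eca_evolution(30, 9, 2)` produces the first three rows of the familiar Rule 30 triangle.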

1-dimensional ECA can be visualized in 2-dimensional space-time diagrams,
where every row is the configuration of the ECA at one time step. Because of
their simplicity, and because we have a good understanding of them (e.g. at
least one ECA is known to be capable of Turing universality [9, 28]), they
are excellent candidates for testing our measure Km,2D and checking whether it
is just as effective as other methods that approach ECA using compression
algorithms [23], which have yielded the results that Wolfram obtained
heuristically.

7.3. Km,2D comparison with compressed ECA evolutions 

We have seen that our Coding theorem method with associated measure
Km (or Km,2D in this paper, for 2D Kolmogorov complexity) is in agreement
with bit string complexity as approached by compressibility, as we have
reported in Section 7.1.



The Universal Distribution from Turing machines that we have calculated
(D(4,2)2D) will help us to classify Elementary Cellular Automata.
Classification of ECA by compressibility has been done before in [22], with
results that are in complete agreement with our intuition and knowledge of the
complexity of certain ECA rules (and related to Wolfram's classification [28]).
In [29] classifications by both simplest initial condition and random initial
condition were undertaken, leading to a stable compressibility classification
of ECAs. Here we followed the same procedure for both the simplest initial
condition (single black cell) and random initial conditions in order to compare
the classification to the one that can be approximated by using D(4,2)2D, as
follows.

We will say that the space-time diagram (or evolution) of an Elementary
Cellular Automaton c after time t has complexity:

$$K_{m,2D_{d\times d}}(c^{t}) = \sum_{r \in \{c^{t}\}_{d\times d}} K_{m,2D}(r) \qquad (10)$$

That is, the complexity of a cellular automaton c is the sum of the
complexities of the arrays r in the partition matrix $\{c^t\}_{d\times d}$
obtained by breaking $\{c^t\}$, produced by the ECA after t steps, into square
arrays of length d. An example of a partition matrix of an ECA evolution is
shown in Fig. 13 for ECA Rule 30 and d = 3, which yields 10 arrays. Notice
that the boundary conditions for a partition matrix may require the addition of
at most d - 1 empty rows or d - 1 empty columns to the boundary as shown in
Fig. 13 (or alternatively the dismissal of at most d - 1 rows or d - 1 columns)
if the dimensions (height and width) are not multiples of d, in this case d = 3.
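The partition and the sum in Eq. 10 can be sketched as follows. Assumptions: the diagram is a list of 0/1 rows; padding rows and columns are added at the bottom and right (the text allows either padding or dismissal and does not fix the side); and `km_table` is a hypothetical lookup from each d×d block to its Km,2D value, standing in for the values derived from D(4,2)2D.

```python
def partition_matrix(diagram, d=3):
    """Split a 0/1 space-time diagram into non-overlapping d x d blocks,
    padding with empty (0) rows/columns at the bottom/right when the height
    or width is not a multiple of d (at most d - 1 of each)."""
    h, w = len(diagram), len(diagram[0])
    H, W = -(-h // d) * d, -(-w // d) * d  # round up to multiples of d
    padded = [row + [0] * (W - w) for row in diagram]
    padded += [[0] * W for _ in range(H - h)]
    return [
        tuple(tuple(padded[i + a][j + b] for b in range(d)) for a in range(d))
        for i in range(0, H, d)
        for j in range(0, W, d)
    ]

def km2d_sum(diagram, km_table, d=3):
    """Eq. 10: add up the Km,2D value of every block in the partition matrix.
    km_table maps each d x d block (a tuple of row tuples) to its Km,2D."""
    return sum(km_table[block] for block in partition_matrix(diagram, d))
```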

If the classification of all ECA rules by Km,2D yields the same
classification obtained by compressibility, one would be persuaded that Km,2D
is a good alternative to compressibility as a method for approximating the
Kolmogorov complexity of objects, with the signal advantage that Km,2D can
be applied to very short strings and very small arrays such as images. Because
all 2^9 possible arrays of size 3×3 are present in Km,2D, we can use this set
of arrays to try to classify all ECAs by Kolmogorov complexity using the
Coding theorem method. Fig. 18 shows all relevant (non-symmetric) arrays.
We denote this subset of Km,2D by Km,2D3x3.



Fig. 15 displays the scatterplot of compression complexity against
Km,2D3x3 calculated for every cellular automaton. It shows a positive link
between the two measures. The Pearson correlation amounts to r = 0.8278,
so the determination coefficient is r² = 0.6853. These values correspond to
a strong correlation, although smaller than the correlation between 1- and
2-dimensional complexities calculated in Section 7.1.
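As a worked check, the determination coefficient is the square of Pearson's r; squaring the 1D/2D coefficient reported in Section 7.1 in the same way makes the comparison like-for-like (simple arithmetic on the quoted values, not an additional result):

```latex
r^2 = 0.8278^2 \approx 0.6853, \qquad 0.9982^2 \approx 0.9964
```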
Figure 13: Decomposing the evolution of ECA Rule 30 (top) after t = 6 steps
into 10 subarrays of size 3×3 (bottom) in order to calculate Km,2D3x3 to
approximate its Kolmogorov complexity.



Concerning the orders arising from these measures of complexity, they too
are strongly linked, with a Spearman correlation of r_s = 0.9200. The
scatterplots (Fig. 15) show strong agreement between the Coding theorem
method and the traditional compression method when both are used to classify
ECAs by their approximation to Kolmogorov complexity.

The anomalies found in the classification of Elementary Cellular Automata
(e.g. Rule 77 being placed among ECA with high complexity according to
Km,2D3x3) are a limitation of Km,2D3x3 itself and not of the Coding theorem
method, which for d = 3 is unable to "see" beyond 3×3 squares, an obviously
very limited window. And yet the degree of agreement with compressibility is
surprising (as is the agreement with intuition, as a glance at Fig. 14 shows,
and as the distribution of ECAs starting from random initial conditions in
Fig. 17 confirms). In fact an average ECA has a complexity of about 20K bits,
which is quite a large program size when compared to what we intuitively gauge
to be the complexity of each ECA, which may suggest that they should have
smaller programs. However, one can think of D(4,2)2D3x3 as attempting to
reconstruct the evolution of each ECA for the given number of steps with square
arrays only 3×3 bits in size, the complexities of these square arrays adding up
to approximate Km,2D of the ECA rule. Hence it is the deployment of
D(4,2)2D3x3 that takes between 500 and 50K bits to reconstruct every ECA
space-time evolution, depending on how random vs. how simple it is.

Figure 14: The first 128 ECAs (the other 128 are 0-1 reverted rules) starting
from the simplest (single black cell) initial configuration, running for t = 36
steps, sorted from lowest to highest complexity according to Km,2D3x3. Notice
that the same procedure can be extended to arbitrary images.

Figure 15: Scatterplots of compression complexity against Km,2D3x3 complexity
as evaluated on the first 128 ECA evolutions after t = 90 steps (horizontal
axes: two-dimensional complexity). The top plot also shows the distribution of
points along the axes, displaying some clusters. The bottom plot shows a few of
the ECA rules used in Fig. 17 (but here for a black cell initial condition).
Other ways to exploit the data from D(4,2)2D (e.g. non-square arrays)
can be utilized to explore better classifications. We think that constructing a
Universal Distribution from a larger set of Turing machines, e.g. D(5,2)2D4x4,
will deliver more accurate results, but here we will also introduce a tweak to
the definition of the complexity of the evolution of a cellular automaton.

7.4. Block Matrix Decomposition method

One can think of an improvement in resolution of the complexity Km,2D(c)
of a cellular automaton c (see Eq. 10) by counting each distinct array only
once and adding log2(n), where n is the number of times the array is
repeated, instead of simply adding the complexities of all the 3×3 arrays.
That is, one penalizes repetition as a "lens" to improve the resolution of
Km,2D approximated by the Coding theorem method. This is possible because
the Kolmogorov complexity of repeated objects grows by log2(n), just as we
explained with an example in Section 2. Adding the complexity approximation
of each array in the partition matrix of a space-time diagram of an ECA
provides an upper bound on the ECA Kolmogorov complexity, as it shows that
there is a program that generates the ECA evolution picture with length equal
to the sum of the programs generating all the sub-arrays (plus a small length
corresponding to the code needed to join the sub-arrays). So if a sub-array
appears twice, we do not need to count its complexity twice, but only once
plus log2(2). Taking this into account, Eq. 10 can be rewritten (denoted by
Klog and called the Block Matrix Decomposition method) as:

$$K^{\log}_{m,2D_{d\times d}}(c^{t}) = \sum_{(r_u, n_u) \in \{c^{t}\}_{d\times d}} \log_2(n_u) + K_{m,2D}(r_u) \qquad (11)$$

where $r_u$ are the different square arrays in the partition matrix
$\{c^t\}_{d\times d}$ and $n_u$ is the number of square arrays in
$\{c^t\}_{d\times d}$ equal to $r_u$.
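Given the blocks produced by a partition such as the one used for Eq. 10, Eq. 11 only changes how repeated blocks are counted. A minimal sketch, with a hypothetical `km_table` of Km,2D values: each distinct block contributes its Km,2D value once, plus log2 of its multiplicity.

```python
from collections import Counter
from math import log2

def klog(blocks, km_table):
    """Eq. 11 (Block Matrix Decomposition): each distinct block r_u counts
    its Km,2D value once, plus log2(n_u) for its n_u occurrences."""
    counts = Counter(blocks)
    return sum(km_table[b] + log2(n) for b, n in counts.items())
```

For example, a block with Km,2D = 20 that occurs four times contributes 20 + log2(4) = 22 bits under Eq. 11, against 4 × 20 = 80 bits under Eq. 10.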

Now complexity values of Klog m,2Dd×d range between 73 and 3K bits, with
a mean program-size value of 867 bits. The classification of ECA according
to Eq. 11 is presented in Fig. 16. There is an almost perfect agreement with
a classification by lossless compression length (see Fig. 16 and 17), which
even makes one wonder whether the Coding theorem method is actually
providing more accurate approximations to Kolmogorov complexity through
Km than lossless compressibility itself. Notice that the same procedure can
be extended to arbitrary images. We call this technique, based on the Coding
theorem method, the Block Matrix Decomposition method.
We think it will prove to be useful in various areas, including machine
learning (ML), as an approximation of Kolmogorov complexity (other
contributions to ML inspired by Kolmogorov complexity can be found in [17]).

It is also worth noting that the fact that ECA can be successfully classified
by Km,2D with an approximation of the Universal Distribution calculated from
Turing machines (TM) suggests that the output frequency distributions of ECA
and TM must be strongly correlated, something that we had found and
reported before in [31] and [13].

Another variation of the same Km,2D measure is to divide the original
image into all possible square arrays of a given length rather than taking a
partition. This would, however, be considerably more expensive than the
partition process alone, and given the results in Fig. 16, further variations
do not seem to be needed, at least not for this case.

7.5. Robustness of the approximations to m(s) 

One important question that arises when positing the soundness of the
Coding theorem method as an alternative to having to pick a universal Turing
machine to evaluate the Kolmogorov complexity K of an object is how many
arbitrary choices are made in the process of following one or another method
and how important they are. One of the motivations of the Coding theorem
method is to deal with the constant involved in the Invariance theorem
(Eq. 2), which depends on the (prefix-free) universal Turing machine chosen to
measure K and which has such an impact on real-world applications involving
short strings. While the constant involved remains, given that after
application of the Coding theorem (Eq. 3) we reintroduce the constant in the
calculation of K, a legitimate question to ask is what difference it makes to
follow the Coding theorem method compared to simply picking the universal
Turing machine.
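The Coding theorem method rests on estimating Km(s) as -log2 of the output frequency D(s). A toy sketch, assuming a list holding the output of every halting machine in an enumerated space; the runtime cutoffs and symmetry completions discussed in the text are not modelled, and `coding_theorem_km` is a hypothetical name.

```python
from collections import Counter
from math import log2

def coding_theorem_km(outputs):
    """Estimate Km(s) = -log2 D(s), where D(s) is the fraction of halting
    machines in the enumerated space whose output is s."""
    counts = Counter(outputs)
    total = sum(counts.values())
    return {s: -log2(c / total) for s, c in counts.items()}
```

Frequently produced strings receive low values and rare ones high values, which is the ordering the Coding theorem relates to Kolmogorov complexity up to an additive constant.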

On the one hand, one has to bear in mind that no other method existed
for approximating the Kolmogorov complexity of short strings. On the other
hand, we have tried to minimize any arbitrary choice, from the formalism of
the computing model to the informed runtime, when no Busy Beaver values
are known and therefore sampling the space using an educated runtime cutoff
is called for. When no Busy Beaver values are known, the chosen runtime is
determined according to the number of machines that we are ready to miss
(e.g. less than 0.01%) for our sample to be significant enough, as described
in Section 5.1. We have also shown in [25] that approximations to the
Universal Distribution from spaces for which Busy Beaver values are known
are in agreement with larger spaces for which Busy Beaver values are not
known.

Figure 16: Block Matrix Decomposition method. The first 128 ECAs (the other
128 are 0-1 reverted rules) starting from the simplest (single black cell)
initial configuration, running for t = 36 steps, sorted from lowest to highest
complexity according to Klog as defined in Eq. 11.

Figure 17: Side-by-side comparison of 8 evolutions of representative ECAs,
starting from a random initial configuration, sorted from lowest to highest
Klog values (top) and from smallest to largest compression lengths using the
Deflate algorithm as a method to approximate Kolmogorov complexity [29].

Among the possible arbitrary choices, it is the enumeration that may perhaps
be questioned, that is, calculating D(n) for increasing n (number of Turing
machine states), hence by increasing size of computer programs (Turing
machines). On the one hand, one way to avoid having to decide which machines
to consider when calculating a Universal Distribution is to cover all of them
for a given number of n states and m symbols, which is what we have done
(hence the enumeration in an exhaustively explored (n, m) space becomes
irrelevant). While it may be an arbitrary choice to fix n and m, the
formalisms we have followed guarantee that n-state m-symbol Turing machines
are in (n + i, m + j) with i, j > 0 (that is, the space of all (n + i)-state
(m + j)-symbol Turing machines). Hence the process is incremental, taking
larger spaces and constructing an average Universal Distribution. In fact, we
have demonstrated [23] that D(5) (that is, the Universal Distribution produced
by the Turing machines with 2 symbols and 5 states) is strongly correlated
to D(4) and represents an improvement in accuracy of the string complexity
values in D(4), which in turn is in agreement with and an improvement on
D(3), and so on. We have also estimated the constant c involved in the
invariance theorem (Eq. 2) between these D(n) for n > 2, which turned out
to be very small in comparison to all the other calculated Universal
Distributions [26]. The invariance theorem guarantees that this c will
eventually tend to zero as n → ∞. However, it does not indicate the rate of
convergence, but the fact that c remains very small among the calculated
Universal Distributions is a good sign.

On the other hand, we have shown before [26] and in this paper that
the method is stable in the face of the changes in Turing machine formalism
that we have undertaken. We have changed all the parameters in the Turing
machine model that we could in order to assess their impact on the estimation
of the Universal Distribution: the number of states, but also the number of
symbols and even the directions of movement of the Turing machine head, as
shown herein. We have shown and reported here and in [25] that all these
changes yield distributions that are strongly correlated with each other, to
the point that we can assert that all these parameters have a marginal impact
on the final distributions. In [31] we also proposed a way to compare
approximations to the Universal Distribution by completely different
computational models (e.g. Post tag systems and cellular automata), showing
that for the studied cases reasonable estimations with different degrees of
correlation were produced.

8. Conclusions 

We have shown that the results of Km,2D are in agreement with
compressibility when applied to a concatenation of strings of the same
complexity, and with classifications of Elementary Cellular Automata (ECA) by
compressibility. However, Km,2D (like our Km in [25]) is finer-grained than
compressibility in the number of different classes it provides. As a
consequence, the Coding theorem method seems to have an advantage over
lossless compression algorithms in distinguishing the complexity of small
objects. For example, among the ECA, Km,2D3x3 provided 51 different complexity
values, while compression retrieved 45 different values. These are not very
different from each other, but the smaller the object, the more important it is
to have a greater number of complexity values in order to distinguish the
complexity of one string from another. For short enough objects, compression
algorithms collapse all complexity values into that of a random object (the
compressed length of short objects is larger than the original object itself),
as seen in [25]. Together with the evidence we have provided in this paper,
our measure and the Coding theorem method we are advancing here look sound
and ready for applications, ready to serve as a complement to the compression
method, especially where compression fails.
With two different experiments we have demonstrated that our measure
is compatible with compression, yielding similar results while providing an
alternative to compression for short strings, namely the Coding theorem
method. We have also shown that Km,2D (and Km) are ready for applications,
and that calculating Universal Distributions is a stable alternative to
compression and a worthwhile tool for approximating the Kolmogorov
complexity of objects, strings and images (arrays). We think this method
will prove to do the same for a wide range of areas where compression is not
an option given the size of the strings involved.

We also introduced the Block Matrix Decomposition method. As we have
seen with anomalies in the classification such as ECA Rule 77 (see Fig. 14),
when approaching the complexity of the space-time diagrams of ECA by
splitting them into square arrays of size 3, the Coding theorem method does
have its limitations, especially because it is computationally very expensive
(although the most expensive part, producing an approximation of the
Universal Distribution, needs to be done only once). Like other high-precision
instruments for examining the tiniest objects in our world, measuring the
smallest complexities is very expensive, just as the compression method can
also be very expensive for large amounts of data.

Splitting ECA rules into square arrays of size 3 is like trying to look through
little windows of 9 pixels one at a time in order to recognize a face, or
training a microscope on a planet in the sky. One can do better with the
Coding theorem method by going further than we have in the calculation of
a 2-dimensional Universal Distribution (e.g. calculating in full, or a sample
of, D(5,2)2D4x4), but eventually how far this process can be taken is dictated
by the computational resources at hand. Nevertheless, one should use a
telescope where telescopes are needed and a microscope where microscopes
are needed.

We have made available to the community "a microscope" in the form
of the Online Algorithmic Complexity Calculator implementing Km (in the
future it will also implement Km,2D, many other objects and a wider range
of methods), which provides objective complexity estimations for short binary
strings. It can be visited at http://www.complexitycalculator.com, where
the raw data for this paper can also be found.